Abstract 

For a given discrete decomposable graphical model, we identify several alternative 
parametrizations, and construct the corresponding reference priors for suitable 
groupings of the parameters. Specifically, assuming that the cliques of the graph 
are arranged in a perfect order, the parameters we consider are conditional 
probabilities of clique-residuals given separators, as well as generalized 
log-odds-ratios. We also consider a parametrization associated to a collection 
of variables representing a cut for the statistical model. The reference priors 
we obtain do not depend on the order of the groupings, belong to a conjugate 
family, and are proper. 



Some key words: Clique; Conjugate family; Contingency table; Cut; Log-linear 
model; Multinomial model; Natural exponential family. 

1 Introduction 

Graphical models, see e.g. Lauritzen (1996), are statistical models such that 
dependencies between variables are expressed by means of a graph. The study of graphical 
models is an established and active area of applied and theoretical research. Directed 
graphs for discrete variables, often called Bayesian networks, see e.g. Cowell et al. 
(1999), have been used in a variety of applied domains, and represent the engine of 
probabilistic expert systems. On the other hand, undirected graphical models for 
discrete data are best employed for the analysis of multi-way contingency 
tables, and represent a useful subset of hierarchical log-linear models. 

In this paper we are concerned with the Bayesian analysis of discrete undirected 
graphical models, whose underlying graph is decomposable. When working in a 
Bayesian framework, a prior distribution on the parameter space is required. 
Priors for undirected discrete graphical, or more generally, log-linear models have been 
considered in Dawid and Lauritzen (1993), Madigan and York (1995), Dellaportas 
and Forster (1999), King and Brooks (2001), Dellaportas and Tarantola (2005). 

Despite the adoption of reasonably simplified models, prior elicitation still 
represents a major concern even for moderately large graphs, because of the very high 
number of parameters involved. This naturally suggests searching for default, or objective, 
priors, which require minimal subjective input and are essentially model-based. However 
there is now evidence, see e.g. Berger (2000) and Casella (1996), that naive approaches 
based on flat non-informative priors are largely inadequate in multi-parameter 
settings. In this context, reference analysis provides one of the most successful general 
methods to derive default prior distributions. For a recent and informative review 
see Bernardo (2005). While the algorithmic complexity for the construction of 
reference priors can be substantial, it is known that suitable re-parametrizations of the 
model may considerably simplify the task, see for instance Consonni et al. (2004) 
and Consonni and Leucari (2006). 

We address two specific issues in this paper: identifying alternative 
parametrizations for a given discrete graphical model, and constructing the corresponding 
reference priors. More precisely, in §2 we consider several parametrizations: conditional 
probabilities of clique-residuals given separators, as well as generalized log-odds ratios 
that arise as canonical parameters of equivalent exponential family representations of 
the underlying sampling distribution, and explicate their mutual relationships. In §3 
we provide the expressions for the corresponding reference priors, and discuss their 
main properties. In §4 we present a parametrization associated to a cut in the 
graphical model and derive the corresponding reference prior. Some points for discussion 
are summarized in the last section. Technical details for the proof of the relationships 
between various parametrizations are given in the Appendix. 

2 Generalized log-odds-ratios parametrizations 
2.1 Preliminaries 

Let us recall some basic facts about undirected graphs and graphical models: for 
further details the reader is referred to Lauritzen (1996, ch. 2). An undirected graph 
$G$ is a pair $(V, E)$ where $V$ is a finite set of vertices and $E$ the set of edges, an edge 
being an unordered pair $\{\gamma, \delta\}$, $\gamma \in V$, $\delta \in V$, $\gamma \neq \delta$. Henceforth the graph $G$ is 
assumed to be decomposable. For a given ordering $C_1, \ldots, C_k$ of the cliques, we will 
use the following notation 

$$ H_l = \cup_{j=1}^{l} C_j, \ l = 1, \ldots, k, \quad S_l = H_{l-1} \cap C_l, \ l = 2, \ldots, k, \quad R_l = C_l \setminus S_l, \ l = 2, \ldots, k. $$

A given ordering of the cliques is said to be perfect if for any $l > 1$ there is an $i < l$ 
such that $S_l \subseteq C_i$. When we have a perfect ordering of the cliques, the $S_l$, $l = 2, \ldots, k$ 
are minimal separators. The $H_l$ and $R_l$ are called, respectively, the $l$-th history and 
$l$-th residual. 
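
The definitions of histories, separators and residuals can be illustrated with a short sketch. It is not part of the original development: the function names are hypothetical, and the four cliques are those of the graph used later in Example 2.1. The code computes $H_l$, $S_l$, $R_l$ for a given clique ordering and checks the perfect-ordering condition.

```python
def histories_separators_residuals(cliques):
    """Return lists (H, S, R) for an ordered list of cliques (Python sets).

    Index 0 corresponds to l = 1; S and R are undefined for l = 1,
    so their first entry is None.
    """
    H, S, R = [], [None], [None]
    seen = set()
    for l, C in enumerate(cliques):
        if l > 0:
            S.append(seen & C)        # S_l = H_{l-1} intersected with C_l
            R.append(C - S[-1])       # R_l = C_l minus S_l
        seen = seen | C
        H.append(set(seen))           # H_l = C_1 u ... u C_l
    return H, S, R


def is_perfect(cliques):
    """Perfect ordering: every S_l is contained in some earlier clique C_i."""
    _, S, _ = histories_separators_residuals(cliques)
    return all(
        any(S[l] <= cliques[i] for i in range(l))
        for l in range(1, len(cliques))
    )


cliques = [set("abc"), set("bcd"), set("cde"), set("ef")]
H, S, R = histories_separators_residuals(cliques)
perfect = is_perfect(cliques)

# Swapping the second and third cliques breaks the condition.
not_perfect = is_perfect([set("abc"), set("cde"), set("bcd"), set("ef")])
```

With this ordering the separators are $\{b,c\}$, $\{c,d\}$, $\{e\}$ and the residuals are $\{d\}$, $\{e\}$, $\{f\}$.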
A graphical model, Markov with respect to a given graph $G$, is a family of 
probability distributions on $(X_\gamma, \gamma \in V)$ such that $X_\delta$ is independent of $X_\gamma$ given $X_{V \setminus \{\delta, \gamma\}}$ 
whenever $\{\gamma, \delta\}$ is not in $E$. 

In this paper we shall focus on contingency tables arising from the classification 
of $N$ units according to a finite set $V$ of criteria, see Lauritzen (1996, Ch. 4). Each 
criterion is represented by a variable $X_\gamma$, $\gamma \in V$, which takes values in a finite set $\mathcal{I}_\gamma$. 
Let $\mathcal{I} = \times_{\gamma \in V} \mathcal{I}_\gamma$. The cells of the table are the elements 

$$ i = (i_\gamma, \gamma \in V), \quad i \in \mathcal{I}. \qquad (2.1) $$

Each of $N$ individuals falls into cell $i$ independently with probability $p(i)$; we let 
$p = (p(i), i \in \mathcal{I})$, with $\sum_{i \in \mathcal{I}} p(i) = 1$. Furthermore, we write $n(i)$ for the $i$-th 
cell-count and $n = (n(i), i \in \mathcal{I})$, with $\sum_{i \in \mathcal{I}} n(i) = N$. 

We consider here the model $\mathcal{M}_G$, which, for a given $G$ and a given integer $N$, is the 
set of multinomial $\mathcal{M}(N, p)$ distributions, with $N = \sum_{i \in \mathcal{I}} n(i)$ and $p = (p(i), i \in \mathcal{I})$ 
in the $(|\mathcal{I}| - 1)$-dimensional simplex, which are Markov with respect to $G$. 

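From now on, we adopt the notation "$D \subseteq_0 V$" to mean that $D$ may be the 
empty set while "$D \subseteq V$" excludes the empty set. Let $\mathcal{E}$ denote the power set of $V$, 
excluding the empty set, i.e. 

$$ \mathcal{E} = \{F \subseteq V, F \neq \emptyset\}. $$

For $D \in \mathcal{E}$, 

$$ i_D = (i_\gamma, \gamma \in D), \ \text{and} \ n(i_D), \quad i_D \in \mathcal{I}_D = \times_{\gamma \in D} \mathcal{I}_\gamma \qquad (2.2) $$

denote a cell in the $D$-marginal table, and its corresponding count. We therefore 
have 

$$ n(i_D) = \sum_{j \in \mathcal{I} \mid j_D = i_D} n(j) = \sum_{j_{V \setminus D} \in \mathcal{I}_{V \setminus D}} n(i_D, j_{V \setminus D}). \qquad (2.3) $$

Note that $n(i_\emptyset) = N$. For $F, D$ in $\mathcal{E}$ we use the notation $p^D(i_D)$ and $p^{F|i_D}(i_F)$ to 
denote, respectively, the marginal and the conditional probabilities 

$$ p^D(i_D) = \sum_{j \in \mathcal{I} \mid j_D = i_D} p(j) \qquad (2.4) $$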
Assuming that "0" indicates one of the levels for each variable, we let $i^*_\gamma$ denote 
the "0"-level in $\mathcal{I}_\gamma$, so that 

$$ i^* = (i^*_\gamma, \gamma \in V) $$

denotes the cell with all components equal to 0. 

Definition 2.1 For $D \in \mathcal{E}$, we define 

$$ \mathcal{I}^*_D = \{i_D \mid i_\gamma \neq i^*_\gamma, \ \forall \gamma \in D\}. \qquad (2.6) $$

In words, $\mathcal{I}^*_D$ is the set of marginal cells such that none of their components is equal 
to 0. We set $\mathcal{I}^*_V = \mathcal{I}^*$. For example, if $D = \{a, b, c\}$, $a$ takes the values $\{0, 1, 2, 3\}$, $b$ 
takes the values $\{0, 1, 2\}$, $c$ takes the values $\{0, 1\}$, then 

$$ \mathcal{I}^*_D = \{(1,1,1), (2,1,1), (3,1,1), (1,2,1), (2,2,1), (3,2,1)\}. $$
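
As a small illustration of Definition 2.1, the set of marginal cells with no component at the "0" level can be enumerated directly. The helper name `nonzero_cells` and the dictionary layout are assumptions made for this sketch only.

```python
import itertools


def nonzero_cells(levels, D):
    """Enumerate the cells i_D whose components all differ from the 0 level.

    `levels` maps each variable name to its list of levels;
    `D` is an ordered collection of variable names.
    """
    per_variable = [[x for x in levels[v] if x != 0] for v in D]
    return list(itertools.product(*per_variable))


levels = {"a": [0, 1, 2, 3], "b": [0, 1, 2], "c": [0, 1]}
cells = nonzero_cells(levels, "abc")
```

For the example in the text this yields the six cells listed above, i.e. $3 \times 2 \times 1$ configurations.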
2.2 The saturated case 

We assume here that $G$ is complete and $\mathcal{M}_G$ is therefore the saturated multinomial 
model for $n = (n(i), i \in \mathcal{I})$. The multinomial probability function is usually written 
in terms of the cell probabilities $p = (p(i), i \in \mathcal{I})$ as 

$$ f(n \mid p) = \frac{N!}{\prod_{i \in \mathcal{I}} n(i)!} \prod_{i \in \mathcal{I}} p(i)^{n(i)} \qquad (2.7) $$

where the only restriction on the parameters $p(i)$ is $\sum_i p(i) = 1$. It is convenient to 
regard the multinomial coefficient in (2.7) as being part of the dominating measure, 
so that the actual density is simply $\prod_{i \in \mathcal{I}} p(i)^{n(i)}$. Assuming that all probabilities are 
positive, the density (2.7), with respect to a suitable dominating measure, can be 
represented in exponential family form as 

$$ \prod_{i \in \mathcal{I}} p(i)^{n(i)} = \exp\Big\{ \sum_{i \in \mathcal{I} \setminus \{i^*\}} n(i)\, \xi(i) - N \log\Big(1 + \sum_{i \in \mathcal{I} \setminus \{i^*\}} e^{\xi(i)}\Big) \Big\}, \qquad (2.8) $$

where 

$$ \xi(i) = \log \frac{p(i)}{p(i^*)}, \quad i \in \mathcal{I} \setminus \{i^*\}, \qquad (2.9) $$

are the usual log-odds, relative to the benchmark cell $i^*$. We recognize in (2.8) a 
natural exponential family (henceforth abbreviated NEF), with canonical parameters 
$\xi(i)$ and canonical statistics $n(i)$, $i \neq i^*$. For a review of NEFs, see e.g. Kotz, 
Balakrishnan, and Johnson (2000, ch. 54). 

In this paper, we shall work with NEF-representations alternative to (2.8), 
featuring different canonical statistics and their corresponding canonical 
parametrizations, the latter representing various generalized log-odds-ratios of joint probabilities, 
residual-conditional probabilities or clique-marginal probabilities. For the saturated 
model, we need consider only the generalized log-odds-ratio of model probabilities 
defined as follows. 

Definition 2.2 For all $D \subseteq V$ and $i_D \in \mathcal{I}^*_D$ we define the log-linear parameters 

$$ \theta(i_D) = \log \prod_{F \subseteq_0 D} p(i_F, i^*_{V \setminus F})^{(-1)^{|D \setminus F|}}. \qquad (2.10) $$

Note that for $F = \emptyset$, $p(i_F, i^*_{V \setminus F}) = p(i^*)$ and $\theta(i_\emptyset) = \theta(i^*) = \log p(i^*)$. The parameters 
$\theta(i^*)$ and $p(i^*)$ are not free but functions of the other $\theta$ or $p$ parameters. We also 
emphasize the fact that while $\theta(i_D)$ is indexed by the marginal cell $i_D$, it is a function 
of the joint probabilities $p(i)$ in the full table. 
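Making the change of variables 

$$ (n(i), i \in \mathcal{I} \setminus \{i^*\}) \mapsto (n(i_D), D \subseteq V, i_D \in \mathcal{I}^*_D), $$

it is relatively easy to show the following expression of the multinomial distribution. 

Proposition 2.1 The NEF-representation of the saturated multinomial model in 
terms of the log-linear parameters $\theta(i_D)$ is given by 

$$ \prod_{i \in \mathcal{I}} p(i)^{n(i)} = \exp\Big\{ \sum_{D \subseteq V} \sum_{i_D \in \mathcal{I}^*_D} n(i_D)\, \theta(i_D) - N \log\Big(1 + \sum_{D \subseteq V} \sum_{i_D \in \mathcal{I}^*_D} \exp \sum_{F \subseteq D} \theta(i_F)\Big) \Big\}. \qquad (2.11) $$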
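
The following sketch gives a numerical sanity check of the log-linear parametrization on a small saturated table. It is only an illustration under stated assumptions: the $3 \times 2$ table, the random probabilities and all function names are made up for the example. The code computes the generalized log-odds-ratios of Definition 2.2 from $p$ and then recovers $p$ from them through the normalizing term $1 + \sum \exp \sum_{F \subseteq D} \theta(i_F)$.

```python
import itertools
import math
import random

LEVELS = {"a": 3, "b": 2}                      # assumed toy table: 3 x 2
VARS = sorted(LEVELS)
CELLS = list(itertools.product(*[range(LEVELS[v]) for v in VARS]))


def subsets(items, include_empty=True):
    """All subsets of `items` as tuples, optionally skipping the empty set."""
    start = 0 if include_empty else 1
    for r in range(start, len(items) + 1):
        yield from itertools.combinations(items, r)


def embed(cell, F):
    """Cell agreeing with `cell` on the positions in F and equal to 0 elsewhere."""
    return tuple(x if pos in F else 0 for pos, x in enumerate(cell))


def theta(p, cell):
    """Log-linear parameter theta(i_D), D being the non-zero positions of `cell`."""
    D = [pos for pos, x in enumerate(cell) if x != 0]
    return sum(
        (-1) ** (len(D) - len(F)) * math.log(p[embed(cell, F)])
        for F in subsets(D)
    )


rng = random.Random(1)
weights = [rng.uniform(0.2, 1.0) for _ in CELLS]
total = sum(weights)
p = {c: w / total for c, w in zip(CELLS, weights)}

# One parameter for every cell other than the all-zero cell i*.
th = {c: theta(p, c) for c in CELLS if any(c)}


def log_odds(cell):
    """Sum of theta(i_F) over the non-empty subsets F of the support of `cell`."""
    D = [pos for pos, x in enumerate(cell) if x != 0]
    return sum(th[embed(cell, F)] for F in subsets(D, include_empty=False))


norm = 1.0 + sum(math.exp(log_odds(c)) for c in th)
p_back = {c: math.exp(log_odds(c)) / norm for c in CELLS}
```

The recovered `p_back` agrees with `p` up to rounding error, which is the Moebius inversion underlying the change of parametrization.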
We remark that the canonical parameter $\theta(i_D)$ in (2.11), $D \subseteq V$, is defined only 
for $i_D \in \mathcal{I}^*_D$, i.e. all those cell-configurations having no component equal to 0; 
alternatively the remaining components indexed by $i_D \in \mathcal{I}_D \setminus \mathcal{I}^*_D$ may be regarded as being 
set to zero in (2.11), and thus satisfy the usual "corner constraint" used for instance 
in GLIM. Furthermore, the canonical statistics $n(i_D)$ represent the marginal counts 
for all cells $i_D$, $D \subseteq V$ and $i_D \in \mathcal{I}^*_D$. 

2.3 The case for G decomposable 

For a given decomposable, non-complete graph $G$, it is a simple consequence of the 
Hammersley-Clifford theorem (see Lauritzen, 1996, p. 36 and Liu and Massam, 2007) that 
the multinomial model is Markov with respect to $G$ if and only if, for $i_D \in \mathcal{I}^*_D$, $D \subseteq V$, 

$$ \theta(i_D) = 0 \ \text{ whenever } D \text{ is not complete in } G. \qquad (2.12) $$

We refer to the model (2.11) satisfying (2.12) as the multinomial model $\mathcal{M}_G$ Markov with respect 
to $G$. More briefly, we refer to it as the multinomial Markov model. 
For any subset $A \subseteq V$ of the vertex set, define 

$$ \mathcal{D}^A = \{D \subseteq A \mid D \text{ is complete}\} \qquad (2.13) $$
$$ \mathcal{D}^A_0 = \{D \subseteq_0 A \mid D \text{ is complete}\}. \qquad (2.14) $$

To simplify notation, we will write $\mathcal{D}$ for $\mathcal{D}^V$. We are going to present in this 
subsection three parametrizations for $\mathcal{M}_G$. 

The first parametrization is in terms of the log-linear parameters defined in (2.10) 
with canonical parameter 

$$ \theta^{mod} = \theta(\mathcal{D}) = (\theta(i_D), D \in \mathcal{D}, i_D \in \mathcal{I}^*_D) \qquad (2.15) $$

and corresponding canonical statistic 

$$ n(\mathcal{D}) = (n(i_D), D \in \mathcal{D}, i_D \in \mathcal{I}^*_D). \qquad (2.16) $$






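It will also be convenient to use the notation 

$$ k(\theta(\mathcal{D}^A)) = \log\Big(1 + \sum_{D \subseteq A} \sum_{i_D \in \mathcal{I}^*_D} \exp \sum_{F \subseteq D} \theta(i_F)\Big) \qquad (2.17) $$

for the cumulant generating function, and the notation 

$$ \langle \theta(\mathcal{D}^A), n(\mathcal{D}^A) \rangle = \sum_{D \in \mathcal{D}^A} \sum_{i_D \in \mathcal{I}^*_D} \theta(i_D)\, n(i_D) \qquad (2.18) $$

for the inner product. The NEF representation in terms of $\theta(\mathcal{D})$ can then be 
immediately derived from (2.11) as follows. 

Proposition 2.2 Let $G$ be a decomposable graph. The NEF-representation of the 
multinomial Markov model in terms of the parametrization $\theta^{mod}$ is given by 

$$ \exp\{\langle \theta(\mathcal{D}), n(\mathcal{D}) \rangle - N\, k(\theta(\mathcal{D}))\}. \qquad (2.19) $$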

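Let us now introduce a second parametrization which is relative to the marginal 
distribution for $C_1$ and the conditional distributions for $R_l$ given $S_l$. For a given 
perfect ordering $C_1, \ldots, C_k$ of the cliques of $G$, the Markov property implies (see 
Lauritzen, 1996, p. 90) 

$$ p(i) = \frac{\prod_{l=1}^{k} p^{C_l}(i_{C_l})}{\prod_{l=2}^{k} p^{S_l}(i_{S_l})}. \qquad (2.20) $$

As a consequence we can write the multinomial density (2.7) as 

$$ \prod_{i \in \mathcal{I}} p(i)^{n(i)} = \prod_{i \in \mathcal{I}} \Big( \frac{\prod_{l=1}^{k} p^{C_l}(i_{C_l})}{\prod_{l=2}^{k} p^{S_l}(i_{S_l})} \Big)^{n(i)} = \prod_{i \in \mathcal{I}} \Big( p^{C_1}(i_{C_1}) \prod_{l=2}^{k} p^{R_l|i_{S_l}}(i_{R_l}) \Big)^{n(i)} $$
$$ = \prod_{i_{C_1} \in \mathcal{I}_{C_1}} (p^{C_1}(i_{C_1}))^{n(i_{C_1})} \prod_{l=2}^{k} \prod_{i_{C_l} \in \mathcal{I}_{C_l}} (p^{R_l|i_{S_l}}(i_{R_l}))^{n(i_{C_l})} $$
$$ = \prod_{i_{C_1} \in \mathcal{I}_{C_1}} (p^{C_1}(i_{C_1}))^{n(i_{C_1})} \prod_{l=2}^{k} \prod_{i_{S_l} \in \mathcal{I}_{S_l}} \prod_{i_{R_l} \in \mathcal{I}_{R_l}} (p^{R_l|i_{S_l}}(i_{R_l}))^{n(i_{S_l}, i_{R_l})}. \qquad (2.21) $$

Note that (2.21) expresses the multinomial Markov model in terms of the marginal 
probabilities in the $C_1$-table, as well as the conditional probabilities in the $i_{S_l}$-slice of 
the $R_l$-table, for $l = 2, \ldots, k$. 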

Formally, for $B \in \mathcal{D}$ and $A \in \mathcal{D}$ with $A \cap B = \emptyset$, the $i_B$-slice of the $A$-table is 
obtained by classifying, according to the factors in $A$, only those units that belong 
to the marginal $i_B$-cell (for the notion of "slice" in a contingency table see Lauritzen, 
1996, p. 68). 

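Let us now define the log-linear parameters corresponding to the factorization 
(2.21). 

Definition 2.3 For each clique $C_l$, $l = 1, \ldots, k$, we define 

$$ \theta^{C_l}(i_D) = \log \prod_{F \subseteq_0 D} \big(p^{C_l}(i_F, i^*_{C_l \setminus F})\big)^{(-1)^{|D \setminus F|}}, \quad D \subseteq C_l, \ i_D \in \mathcal{I}^*_D. \qquad (2.22) $$

Definition 2.4 For each residual $R_l$, $l = 2, \ldots, k$, and fixed $i_{S_l} \in \mathcal{I}_{S_l}$, we define 

$$ \theta^{R_l|i_{S_l}}(i_D) = \log \prod_{F \subseteq_0 D} \big(p^{R_l|i_{S_l}}(i_F, i^*_{R_l \setminus F})\big)^{(-1)^{|D \setminus F|}}, \quad D \subseteq R_l, \ i_D \in \mathcal{I}^*_D. \qquad (2.23) $$

Note that both $\theta^{C_l}(i_D)$ and $\theta^{R_l|i_{S_l}}(i_D)$ are "marginal" parameters, in the sense that 
they are functions of probabilities in the $C_l$-marginal table. 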

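For any $A \subseteq V$, $B \subseteq V$, $B \cap A = \emptyset$ and any fixed $i_B \in \mathcal{I}_B$, also introduce the 
notation 

$$ \theta(\mathcal{D}^{C_l}) = (\theta^{C_l}(i_D), D \subseteq C_l, i_D \in \mathcal{I}^*_D), \qquad (2.24) $$
$$ n(\mathcal{D}^{C_l}) = (n(i_D), D \subseteq C_l, i_D \in \mathcal{I}^*_D), $$

representing the log-linear parameters and cell-counts for the clique-$C_l$-table. 
Furthermore we will use 

$$ \theta(i_B, \mathcal{D}^A) = (\theta^{A|i_B}(i_D), D \subseteq A, i_D \in \mathcal{I}^*_D), \qquad (2.25) $$
$$ n(i_B, \mathcal{D}^A) = (n(i_B, i_D), D \subseteq A, i_D \in \mathcal{I}^*_D), $$

to represent the log-linear parameters and the cell-counts respectively in the $i_B$-slice 
of the $A$-table. 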

We collect together the elements of (2.24) and (2.25) in a single parameter that 
we call $\theta^{cond}$: 

$$ \theta^{cond} = (\theta(\mathcal{D}^{C_1}), \ \theta(i_{S_l}, \mathcal{D}^{R_l}), \ i_{S_l} \in \mathcal{I}_{S_l}, \ l = 2, \ldots, k). \qquad (2.26) $$

Correspondingly we define the following canonical statistics 

$$ n^{cond} = (n(\mathcal{D}^{C_1}), \ n(i_{S_l}, \mathcal{D}^{R_l}), \ i_{S_l} \in \mathcal{I}_{S_l}, \ l = 2, \ldots, k). \qquad (2.27) $$

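Since $C_1$ and $R_l$, $l = 2, \ldots, k$ are complete, we can apply Proposition 2.1 to each 
of the $C_1$-marginal and $R_l$-conditional multinomials in the $i_{S_l}$-slice of (2.21). We have 
the following lemma as an immediate consequence of Proposition 2.1. 

Lemma 2.1 The NEF-representation, in terms of the parametrization $\theta^{cond}$, 

• of the marginal $C_1$-model is given by 

$$ \prod_{i_{C_1} \in \mathcal{I}_{C_1}} (p^{C_1}(i_{C_1}))^{n(i_{C_1})} = \exp\{\langle \theta(\mathcal{D}^{C_1}), n(\mathcal{D}^{C_1}) \rangle - N\, k(\theta(\mathcal{D}^{C_1}))\}. \qquad (2.28) $$

• of the conditional $R_l$-model in the $i_{S_l}$-slice is given by 

$$ \prod_{i_{R_l} \in \mathcal{I}_{R_l}} (p^{R_l|i_{S_l}}(i_{R_l}))^{n(i_{S_l}, i_{R_l})} = \exp\{\langle \theta(i_{S_l}, \mathcal{D}^{R_l}), n(i_{S_l}, \mathcal{D}^{R_l}) \rangle - n(i_{S_l})\, k(\theta(i_{S_l}, \mathcal{D}^{R_l}))\}. \qquad (2.29) $$

Note that the number of parameters in $\theta^{mod}$ and $\theta^{cond}$ is of course the same. Indeed 
each element of each one of the two parametrizations is indexed by $i_D$, $D \in \mathcal{D}$, $i_D \in \mathcal{I}^*_D$, 
either directly as for $\theta^{mod}$, or through the components $i_F$, $F \subseteq_0 S_l$, $i_F \in \mathcal{I}^*_F$ and 
$i_D$, $D \subseteq R_l$, $i_D \in \mathcal{I}^*_D$ as for $\theta^{cond}$. 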

Since the clique marginal generalized log-odds ratios are also of interest, we are 
now going to define a third parametrization of the multinomial model in terms of the 
generalized log-odds ratios in (2.22). Any marginal cell $i_{S_l}$ can be written as 

$$ i_{S_l} = (i_F, i^*_{S_l \setminus F}) $$

where $F \in \mathcal{D}^{S_l}_0$. Accordingly, we define 

$$ \theta(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}) = (\theta^{C_l}(i_F, i_D), \ D \subseteq R_l, \ i_D \in \mathcal{I}^*_D, \ F \subseteq_0 S_l, \ i_F \in \mathcal{I}^*_F) $$
$$ n(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}) = (n(i_F, i_D), \ D \subseteq R_l, \ i_D \in \mathcal{I}^*_D, \ F \subseteq_0 S_l, \ i_F \in \mathcal{I}^*_F) $$

and 

$$ \theta^{cliq} = (\theta(\mathcal{D}^{C_1}), \ \theta(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}), \ l = 2, \ldots, k). \qquad (2.30) $$
$$ n^{cliq} = (n(\mathcal{D}^{C_1}), \ n(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}), \ l = 2, \ldots, k). \qquad (2.31) $$

We note that for $F = \emptyset$, $\theta^{C_l}(i_F, i_D) = \theta^{C_l}(i_D)$ and $n(i_F, i_D) = n(i_D)$. Clearly the 
number of parameters in $\theta^{cliq}$ is the same as in $\theta^{cond}$. 

The expression of the density in terms of this new parametrization will be given in 
the next section, after we have derived the relationship between the three 
parametrizations (2.15), (2.26) and (2.30). 

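2.4 Relationship between the various $\theta$ parametrizations 

The relationship between the three $\theta$ parametrizations is given in the following 
proposition. To state the results succinctly, let us also define, for any $F \subseteq V$ and $i_F \in \mathcal{I}^*_F$, 

$$ i_{\subseteq_0 F} = \{i_G, \ G \subseteq_0 F\}. $$

Then for given $F \subseteq V$ and $i_F \in \mathcal{I}^*_F$, and $A \subseteq V$ such that $F \cap A = \emptyset$, we also define 

$$ \theta(i_{\subseteq_0 F}, \mathcal{D}^A) = (\theta(i_G, j_L), \ G \subseteq_0 F, \ L \subseteq A, \ j_L \in \mathcal{I}^*_L) \qquad (2.32) $$

and 

$$ k(\theta(i_{\subseteq_0 F}, \mathcal{D}^A)) = \log\Big(1 + \sum_{L \subseteq A} \sum_{j_L \in \mathcal{I}^*_L} \exp \sum_{K \subseteq_0 F, \ H \subseteq L} \theta(i_K, j_H)\Big). \qquad (2.33) $$

We note that for any $l = 2, \ldots, k$, and $F \subseteq S_l$, 

$$ \theta(i_{\subseteq_0 F}, \mathcal{D}^{R_l}) \subset \theta(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}). $$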

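Proposition 2.3 Let $i_D \in \mathcal{I}^*_D$ and $D \subseteq C_l$, $D \cap R_l \neq \emptyset$. Then 
a) the relationship between $\theta^{cliq}$ and $\theta^{cond}$ is 

$$ \theta^{C_l}(i_D) = \sum_{F \subseteq_0 D \cap S_l} (-1)^{|(D \cap S_l) \setminus F|}\, \theta^{R_l|(i_F, i^*_{S_l \setminus F})}(i_{D \cap R_l}) \qquad (2.34) $$

which, for $D \subseteq R_l$, is equivalent to 

$$ \theta^{R_l|(i_F, i^*_{S_l \setminus F})}(i_D) = \sum_{G \subseteq_0 F} \theta^{C_l}(i_G, i_D). \qquad (2.35) $$

b) The relationship between $\theta^{cliq}$ and $\theta^{mod}$ is as follows. Let $\{> l\}$ denote the set 
of $j \in \{l+1, \ldots, k\}$ such that $C_l \cap C_j \neq \emptyset$. 
(i) For $D \not\subseteq S_j$, for any $j \in \{> l\}$, 

$$ \theta(i_D) = \theta^{C_l}(i_D). \qquad (2.36) $$

(ii) For $D \subseteq S_m$, $m \in \{> l\}$, 

$$ \theta(i_D) = \theta^{C_l}(i_D) - \sum_{F \subseteq_0 D} (-1)^{|D \setminus F|}\, k(\theta(i_{\subseteq_0 F}, \mathcal{D}^{C_{>l}})) \qquad (2.37) $$

where $C_{>l} = \cup_{m > l}(C_m \setminus C_l)$ and $k(\theta(i_{\subseteq_0 F}, \mathcal{D}^{C_{>l}}))$ is defined as in (2.33). 

Moreover, all $\theta(i_H, j_G) \in \theta(i_{\subseteq_0 F}, \mathcal{D}^{C_{>l}})$ are such that $H \cup G \subseteq C_m$ for some 
$m \in \{> l\}$, and each is therefore either equal to $\theta^{C_m}(i_H, j_G)$ or can be expressed in terms 
of $\theta^{C_m}(i_E)$, $m \in \{> l\}$, $E \subseteq C_m$, $i_E \in \mathcal{I}^*_E$. 

The proofs of (2.34) and (2.35) can easily be derived from Definitions 2.3 and 2.4. 
The proof of (2.37), though, is not immediate and is interesting. It is given in the 
appendix. 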

Remarks. 1. Expression (2.35) is a generalization of the relationship between 
conditional and marginal log-odds ratios for a three way table given in Agresti (2002, p. 
322). 

2. According to (2.37), $\theta(i_D)$ is a function of $\theta^{C_m}(j_H)$ such that $H \subseteq C_m$ for $m \geq l$ 
only. This is going to be a crucial fact when we derive the reference prior of $\theta^{mod}$ 
from the reference prior on $\theta^{cliq}$ in the next section. 

Since relation (2.37) is crucial for the derivations in the next section, we 
illustrate it here with an example. 

Example 2.1 Consider a decomposable graphical model with the following perfect 
order of the cliques 

$$ C_1 = \{a, b, c\}, \ C_2 = \{b, c, d\}, \ C_3 = \{c, d, e\}, \ C_4 = \{e, f\}, $$

having separators 

$$ S_2 = \{b, c\}, \ S_3 = \{c, d\}, \ S_4 = \{e\}. $$

To simplify matters, let us assume the data are binary. In this case we can simplify 
the notation since, because of the corner constraint conditions (see end of §2.2), $\mathcal{I}^*_D$ 
contains only one element for each $D$. Thus $\theta(i_D)$ can more simply be written $\theta(D)$. 
Let us take $D = \{c, d\}$. We see that $D \subseteq C_2$ and $D \cap R_2 = \{d\} \neq \emptyset$. Moreover 
$C_{>2} = \{e, f\}$ and the set of $L \subseteq C_{>2}$ is equal to $\{e, f, ef\}$. Then according to (2.37), 
it follows that 

$$ \theta(cd) = \theta^{C_2}(cd) - \log\big(1 + e^{\theta(e)+\theta(ec)+\theta(ed)+\theta(ecd)} + e^{\theta(f)} + e^{\theta(e)+\theta(ec)+\theta(ed)+\theta(ecd)+\theta(f)+\theta(ef)}\big) $$
$$ + \log\big(1 + e^{\theta(e)+\theta(ec)} + e^{\theta(f)} + e^{\theta(e)+\theta(ec)+\theta(f)+\theta(ef)}\big) $$
$$ + \log\big(1 + e^{\theta(e)+\theta(ed)} + e^{\theta(f)} + e^{\theta(e)+\theta(ed)+\theta(f)+\theta(ef)}\big) $$
$$ - \log\big(1 + e^{\theta(e)} + e^{\theta(f)} + e^{\theta(e)+\theta(f)+\theta(ef)}\big). $$

Since 

$$ \theta(ec) = \theta^{C_3}(ec), \ \theta(ed) = \theta^{C_3}(ed), \ \theta(ecd) = \theta^{C_3}(ecd), \ \theta(ef) = \theta^{C_4}(ef), \ \theta(f) = \theta^{C_4}(f), $$

and according to (2.37) again, 

$$ \theta(e) = \theta^{C_3}(e) + \log(1 + e^{\theta(f)}) - \log(1 + e^{\theta(f)+\theta(ef)}) = \theta^{C_3}(e) + \log(1 + e^{\theta^{C_4}(f)}) - \log(1 + e^{\theta^{C_4}(f)+\theta^{C_4}(ef)}), $$

we see that $\theta(cd)$ can be expressed in terms of $\theta^{C_m}(E)$, $m \geq 2$, $E \subseteq C_m$. 
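
The identity for $\theta(cd)$ in Example 2.1 can be checked numerically. The sketch below is illustrative only: the joint distribution is built from arbitrary random positive clique potentials (an assumption of this example, not something specified in the text), and the function names are hypothetical. It computes $\theta(cd)$ from the joint table, $\theta^{C_2}(cd)$ from the $C_2$-marginal table, and the four cumulant terms of (2.33) with $A = C_{>2} = \{e, f\}$.

```python
import itertools
import math
import random

VARS = "abcdef"
CLIQUES = ["abc", "bcd", "cde", "ef"]          # perfect order of Example 2.1


def build_joint(seed=0):
    """Binary joint distribution that is Markov w.r.t. the graph (random potentials)."""
    rng = random.Random(seed)
    pots = [
        {cfg: rng.uniform(0.5, 2.0)
         for cfg in itertools.product((0, 1), repeat=len(c))}
        for c in CLIQUES
    ]
    weights = {}
    for cell in itertools.product((0, 1), repeat=len(VARS)):
        val = dict(zip(VARS, cell))
        w = 1.0
        for c, pot in zip(CLIQUES, pots):
            w *= pot[tuple(val[v] for v in c)]
        weights[cell] = w
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def marginal(p, subset):
    """Marginal table over the variables in `subset` (kept in the given order)."""
    idx = [VARS.index(v) for v in subset]
    out = {}
    for cell, w in p.items():
        key = tuple(cell[i] for i in idx)
        out[key] = out.get(key, 0.0) + w
    return out


def theta(table, variables, D):
    """Binary log-linear parameter: alternating sum of log-probabilities."""
    total = 0.0
    for r in range(len(D) + 1):
        for F in itertools.combinations(D, r):
            cell = tuple(1 if v in F else 0 for v in variables)
            total += (-1) ** (len(D) - r) * math.log(table[cell])
    return total


def k(p, F, A="ef"):
    """Cumulant term of (2.33) for binary data, with model parameters theta."""
    total = 1.0
    for r_l in range(1, len(A) + 1):
        for L in itertools.combinations(A, r_l):
            s = 0.0
            for r_k in range(len(F) + 1):
                for K in itertools.combinations(F, r_k):
                    for r_h in range(1, len(L) + 1):
                        for H in itertools.combinations(L, r_h):
                            s += theta(p, VARS, K + H)
            total += math.exp(s)
    return math.log(total)


p = build_joint()
D = "cd"
lhs = theta(p, VARS, D)                                   # theta(cd), joint table
correction = sum(
    (-1) ** (len(D) - r) * k(p, F)
    for r in range(len(D) + 1)
    for F in itertools.combinations(D, r)
)
rhs = theta(marginal(p, "bcd"), "bcd", D) - correction    # right side of (2.37)
theta_ad = theta(p, VARS, "ad")                           # {a, d} is not complete
```

Here `lhs` and `rhs` agree up to rounding error, and `theta_ad` is numerically zero, in line with (2.12).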

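We will now give the expression of the multinomial Markov model with respect 
to $\theta^{cliq}$, using relation (2.35). 

Lemma 2.2 Let $G$ be a decomposable graph with its cliques $C_1, \ldots, C_k$ arranged in 
a perfect order. The NEF-representation of the multinomial Markov model in terms 
of the parametrization $\theta^{cliq}$ is given by 

$$ \exp\Big\{ \langle \theta(\mathcal{D}^{C_1}), n(\mathcal{D}^{C_1}) \rangle - N\, k(\theta(\mathcal{D}^{C_1})) + \sum_{l=2}^{k} \Big[ \langle \theta(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}), n(\mathcal{D}^{S_l}_0, \mathcal{D}^{R_l}) \rangle $$
$$ - \sum_{F \subseteq_0 S_l} \sum_{j_F \in \mathcal{I}^*_F} n(j_F) \sum_{H \subseteq_0 F} (-1)^{|F \setminus H|}\, k(\theta(j_{\subseteq_0 H}, \mathcal{D}^{R_l})) \Big] \Big\}. \qquad (2.38) $$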
From (2.38), it appears that under the multinomial Markov model, the joint 
distribution of $n^{cliq}$ admits a conditional reducibility structure, see Consonni and Veronese 
(2001); specifically, it factorizes into the product of $k$ conditional exponential 
families (save for the first term which is a marginal distribution), in a recursive fashion 
according to the clique ordering. 

3 Reference priors 

In this section we shall derive reference priors for the various parametrizations intro- 
duced in section 2. We shall only provide an outline of the proofs of the derivation of 
our reference priors since they follow the steps described in §2 and §4.2.1 of Consonni 
et al. (2004). An important point to keep in mind is that a reference prior for a 
multidimensional parameter depends on the grouping of its components, as well as 
the ordering of its groups: specifically we order groups according to inferential im- 
portance, while parameter-components that belong to the same group are treated in 
a symmetric fashion. For the parametrizations considered in this paper, order will 
not matter, and thus the reference prior will only depend on the grouping-structure. 

For a given graph $G$, let $C_1, \ldots, C_k$ represent a perfect ordering of the cliques. 
We will first consider the reference prior for the collection of conditional probabilities 
(including the marginal probabilities for clique $C_1$), $p^{cond}$ as in (2.21): 

$$ p^{cond} = (p^{C_1}, \ p^{R_l|i_{S_l}}, \ i_{S_l} \in \mathcal{I}_{S_l}, \ l = 2, \ldots, k), \qquad (3.1) $$

where 

$$ p^{C_1} = (p^{C_1}(i_{C_1}), \ i_{C_1} \in \mathcal{I}_{C_1}) \qquad (3.2) $$
$$ p^{R_l|i_{S_l}} = (p^{R_l|i_{S_l}}(i_{R_l}), \ i_{R_l} \in \mathcal{I}_{R_l}), \qquad (3.3) $$

represent the collection of groups. Note that there are $1 + \sum_{l=2}^{k} |\mathcal{I}_{S_l}|$ groups. We 
remark that the nature of the parametrization $p^{cond}$ depends on the specific choice of 
the perfect numbering of the cliques $C_1, \ldots, C_k$. 
Next we will consider the reference priors for $\theta^{cond}$, $\theta^{cliq}$, $\theta^{mod}$ following a parallel 
grouping-structure. We shall see that all these reference priors are closely related, so 
that a unified expression for all of them is possible. 

Proposition 3.1 The reference prior for $p^{cond}$ relative to the grouping defined in 
(3.1) is 

$$ \pi^{R}_{p^{cond}}(p^{cond}) \propto \Big( \prod_{i_{C_1} \in \mathcal{I}_{C_1}} p^{C_1}(i_{C_1}) \Big)^{-1/2} \prod_{l=2}^{k} \prod_{i_{S_l} \in \mathcal{I}_{S_l}} \Big( \prod_{i_{R_l} \in \mathcal{I}_{R_l}} p^{R_l|i_{S_l}}(i_{R_l}) \Big)^{-1/2}. \qquad (3.4) $$

We note that the reference prior is a product of Jeffreys' priors, one for each of 
the groups of $p^{cond}$. 
Proof: In our setting, we simply need to derive the (Fisher) information matrix. 
From (2.21) it appears that the likelihood function factorizes into the product of 
terms, each involving exactly one group of $p^{cond}$; furthermore each term is a saturated 
multinomial. Accordingly the information matrix is block-diagonal, and the 
determinant of each block, using classic results, is easily available. Specifically the first one, 
corresponding to clique $C_1$, is given by 

$$ N^{|\mathcal{I}_{C_1}| - 1} \Big( \prod_{i_{C_1} \in \mathcal{I}_{C_1}} p^{C_1}(i_{C_1}) \Big)^{-1}, \qquad (3.5) $$

while for the remaining blocks the determinant is 

$$ E(n(i_{S_l}) \mid p)^{|\mathcal{I}_{R_l}| - 1} \Big( \prod_{i_{R_l} \in \mathcal{I}_{R_l}} p^{R_l|i_{S_l}}(i_{R_l}) \Big)^{-1}, \quad i_{S_l} \in \mathcal{I}_{S_l}, \ l = 2, \ldots, k. \qquad (3.6) $$

Because of the perfect ordering of the cliques, $S_l \subseteq C_j$ for some $j < l$, so that the 
expected value $E(n(i_{S_l}) \mid p)$ is a function of parameters only belonging to groups 
preceding the $l$-th one. 

Following the theory summarized in Consonni et al. (2004, sect. 2), the reference 
prior is given by the square root of the product of the block-determinants, excluding 
the terms $E(n(i_{S_l}) \mid p)$, and the result is established. □ 

We now emphasize three properties of the reference prior for $p^{cond}$. First of all, 
since the information matrix is block-diagonal, the reference prior is order-invariant, 
i.e. it does not depend on the order of the groups. Secondly, we remark that there 
exists also some degree of invariance with respect to grouping. Specifically, if we 
lumped together in one single block all the $|\mathcal{I}_{S_l}|$ terms $p^{R_l|i_{S_l}}$, $i_{S_l} \in \mathcal{I}_{S_l}$, the reference 
prior would not change. This feature will turn out to be useful later on when deriving 
reference priors for alternative parametrizations. 

Third, we remark that the distribution $\pi^{R}_{p^{cond}}$ belongs to a family conjugate to the 
likelihood for $p^{cond}$, see (2.21). Accordingly its hyper-parameters can be interpreted 
in terms of "prior counts"; the latter however cannot be recovered as the margins of 
a fictitious overall table. Indeed, each cell in the $C_1$-table, as well as in the $i_{S_l}$-slice 
of the $R_l$-table, has a prior count equal to 1/2, irrespective of the dimension of the 
subtables and of the overall table. Finally, the prior is proper, since it is a product 
of Dirichlet priors, one for each block, each Dirichlet being indexed by a vector of 
hyper-parameters with entries all equal to 1/2. 
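
Because the reference prior for $p^{cond}$ is a product of Dirichlet distributions with all hyper-parameters equal to 1/2, drawing from it is straightforward. The sketch below is an illustration under assumptions not found in the text: a three-variable binary chain $a - b - c$ with cliques $C_1 = \{a, b\}$, $C_2 = \{b, c\}$, separator $S_2 = \{b\}$ and residual $R_2 = \{c\}$, and hypothetical function names. Dirichlet draws are obtained by normalizing independent Gamma(1/2) variates.

```python
import random


def dirichlet_half(m, rng):
    """One draw from a Dirichlet(1/2, ..., 1/2) on m cells via normalized gammas."""
    g = [rng.gammavariate(0.5, 1.0) for _ in range(m)]
    s = sum(g)
    return [x / s for x in g]


def sample_reference_prior(rng):
    """Draw p^cond for the chain a - b - c and assemble the joint table.

    Groups: the C_1 = {a, b} marginal (4 cells) and, for each level of
    the separator b, the conditional distribution of the residual c (2 cells).
    """
    p_ab = dirichlet_half(4, rng)                       # cells ordered 00, 01, 10, 11
    p_c_given_b = {b: dirichlet_half(2, rng) for b in (0, 1)}
    joint = {}
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                joint[(a, b, c)] = p_ab[2 * a + b] * p_c_given_b[b][c]
    return joint


rng = random.Random(42)
joint = sample_reference_prior(rng)
```

Each draw is a joint distribution that is Markov with respect to the chain by construction, since it is assembled from the factorization (2.21).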

We now turn to the derivation of the reference priors for the three $\theta$ 
parametrizations described in §2. Central to our arguments below is the following basic fact 
about reference priors that we shall use repeatedly. Let $\lambda$ be a parameter grouped 
into components $\lambda = (\lambda_1, \ldots, \lambda_k)$, where $\lambda_l$ is typically a vector. We assume that 
the above groups are arranged in increasing order of inferential importance. Let 
$\phi = (\phi_1, \ldots, \phi_k)$ be a reparametrization, i.e. a one-to-one function of $\lambda$ with $\phi_l$ 
having the same dimension as $\lambda_l$. Suppose that, for each $l = 1, \ldots, k$, $\phi_l = h_l(\lambda_1, \ldots, \lambda_l)$, 
for some function $h_l$: we say in this case that the map $\lambda \mapsto \phi$ is block-lower triangular. 
Then the reference prior for $\phi$ can be obtained from that of $\lambda$ simply by a change 
of variable. For details and references see again Consonni et al. (2004, §4.2.1). An 
important special case occurs when $\phi_l = h_l(\lambda_l)$: in this case we say that the map is 
block-wise one-to-one. 

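We start by expressing $p^{cond}$ in terms of $\theta^{cond}$. In order to achieve this goal, it 
is convenient to define the parameters 

$$ \xi^{C_1}(i_D) = \sum_{F \subseteq D} \theta^{C_1}(i_F), \quad i_D \in \mathcal{I}^*_D, \qquad (3.7) $$

$$ \xi^{R_l|i_F}(i_D) = \sum_{L \subseteq D} \theta^{R_l|(i_F, i^*_{S_l \setminus F})}(i_L) \qquad (3.8) $$

$$ = \sum_{L \subseteq D} \sum_{H \subseteq_0 F} \theta^{C_l}(i_H, i_L), \quad i_F \in \mathcal{I}^*_F, \ i_L \in \mathcal{I}^*_L, \ D \subseteq R_l. \qquad (3.9) $$

We let 

$$ \xi = (\xi^{C_1}, \ \xi^{R_l|i_F}, \ F \subseteq_0 S_l, \ i_F \in \mathcal{I}^*_F, \ l = 2, \ldots, k) \qquad (3.10) $$

where 

$$ \xi^{C_1} = (\xi^{C_1}(i_D), \ D \subseteq C_1), $$

with $\xi^{R_l|i_F}$ collecting in the same way the parameters $\xi^{R_l|i_F}(i_D)$, $D \subseteq R_l$. 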

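The mapping between $p^{cond}$ and $\xi$ is block-wise one-to-one. As a consequence the 
reference prior on $\xi$ can be deduced from that of $p^{cond}$ as 

$$ \pi^{R}_{\xi}(\xi) = \pi^{R}_{p^{cond}}(p^{cond}(\xi))\, |J_{p^{cond}}(\xi)| \qquad (3.11) $$

where $J_{p^{cond}}(\xi)$ is the Jacobian of the transformation $\xi \mapsto p^{cond}$. It can be verified 
that 

$$ |J_{p^{cond}}(\xi)| = \Big( \prod_{i_{C_1} \in \mathcal{I}_{C_1}} p^{C_1}(i_{C_1})(\xi) \Big) \prod_{l=2}^{k} \prod_{i_{S_l} \in \mathcal{I}_{S_l}} \Big( \prod_{i_{R_l} \in \mathcal{I}_{R_l}} p^{R_l|i_{S_l}}(i_{R_l})(\xi) \Big) \qquad (3.12) $$

so that the induced reference prior for $\xi$ is 

$$ \pi^{R}_{\xi}(\xi) \propto \Big( \prod_{i_{C_1} \in \mathcal{I}_{C_1}} p^{C_1}(i_{C_1})(\xi) \Big)^{1/2} \prod_{l=2}^{k} \prod_{i_{S_l} \in \mathcal{I}_{S_l}} \Big( \prod_{i_{R_l} \in \mathcal{I}_{R_l}} p^{R_l|i_{S_l}}(i_{R_l})(\xi) \Big)^{1/2}. \qquad (3.13) $$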
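
The Jacobian used in this change of variable can be checked numerically for a single block. The sketch below is an illustration with made-up log-odds values and hypothetical function names: for one multinomial block with baseline cell $i^*$, it builds the matrix of partial derivatives of the non-baseline probabilities with respect to the log-odds and compares its determinant with the product of all cell probabilities in the block.

```python
import math


def probabilities_from_log_odds(xi):
    """Cell probabilities of one block; index 0 is the baseline cell."""
    denom = 1.0 + sum(math.exp(x) for x in xi)
    return [1.0 / denom] + [math.exp(x) / denom for x in xi]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= factor * m[c][j]
    return det


xi = [0.3, -1.2, 0.8]                       # assumed log-odds for a 4-cell block
p = probabilities_from_log_odds(xi)
size = len(xi)

# d p_i / d xi_j = p_i (delta_ij - p_j) for the non-baseline cells i, j.
jacobian = [
    [p[i + 1] * ((1.0 if i == j else 0.0) - p[j + 1]) for j in range(size)]
    for i in range(size)
]
jac_det = determinant(jacobian)
cell_product = math.prod(p)
```

The two quantities `jac_det` and `cell_product` coincide up to rounding, so multiplying a Dirichlet(1/2) density in $p$ by this Jacobian changes each exponent from $-1/2$ to $+1/2$.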

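We shall also need the following result which can be easily derived from Definitions 
2.3 and 2.4 and the Moebius inversion formula. 

Lemma 3.1 For $i_{C_1} = (i_F, i^*_{C_1 \setminus F})$, 

$$ p^{C_1}(i_{C_1}) = \frac{\exp \xi^{C_1}(i_F)}{1 + \sum_{H \subseteq C_1} \sum_{j_H \in \mathcal{I}^*_H} \exp \xi^{C_1}(j_H)}. \qquad (3.14) $$

For $i_{S_l} = (i_F, i^*_{S_l \setminus F})$ and $i_{R_l} = (i_G, i^*_{R_l \setminus G})$ given, 

$$ p^{R_l|i_{S_l}}(i_{R_l}) = \frac{\exp \xi^{R_l|i_F}(i_G)}{1 + \sum_{H \subseteq R_l} \sum_{j_H \in \mathcal{I}^*_H} \exp \xi^{R_l|i_F}(j_H)}. \qquad (3.15) $$

As particular cases, we have 

$$ p^{C_1}(i^*_{C_1}) = \frac{1}{1 + \sum_{H \subseteq C_1} \sum_{j_H \in \mathcal{I}^*_H} \exp \xi^{C_1}(j_H)} \qquad (3.16) $$

and 

$$ p^{R_l|i_{S_l}}(i^*_{R_l}) = \frac{1}{1 + \sum_{H \subseteq R_l} \sum_{j_H \in \mathcal{I}^*_H} \exp \xi^{R_l|i_F}(j_H)}. \qquad (3.17) $$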

Since the reference priors of the three $\theta$-parametrizations are structurally 
equivalent we shall provide the result in a unified statement. 

Theorem 3.1 The reference prior for 

a) $\theta^{cond}$, relative to the grouping defined in (2.26), 

b) $\theta^{cliq}$, relative to the grouping defined in (2.30), 

c) $\theta^{mod}$, relative to the following grouping 

$$ \theta^{mod} = \big( (\theta(i_D), D \subseteq C_1, i_D \in \mathcal{I}^*_D), \ (\theta(i_D), D \subseteq C_l, D \cap R_l \neq \emptyset, i_D \in \mathcal{I}^*_D), \ l = 2, \ldots, k \big) \qquad (3.18) $$

is proportional to 

$$ \Big( \prod_{i_{C_1} \in \mathcal{I}_{C_1}} p^{C_1}(i_{C_1})(\cdot) \Big)^{1/2} \prod_{l=2}^{k} \prod_{i_{S_l} \in \mathcal{I}_{S_l}} \Big( \prod_{i_{R_l} \in \mathcal{I}_{R_l}} p^{R_l|i_{S_l}}(i_{R_l})(\cdot) \Big)^{1/2}, \qquad (3.19) $$

where the probabilities $p^{C_1}(i_{C_1})(\cdot)$ and $p^{R_l|i_{S_l}}(i_{R_l})(\cdot)$ are understood to be expressed in 
terms of the relevant $\theta$-parametrization, using (3.14)-(3.17) together with the definitions 
(3.7)-(3.9) and, for $\theta^{mod}$, the relations (2.36) and (2.37). 
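More explicitly, the reference prior 

• for $\theta^{cond}$ is given by the product of (2.28) and (2.29), with the understanding 
that the counts in these formulas are replaced by fictitious prior counts which 
we write as $\tilde{n}(i_D)$, $\tilde{N}$ and so on. More precisely, we have 

$$ \tilde{n}(i_D) = \frac{|\mathcal{I}_{C_1 \setminus D}|}{2}, \quad \tilde{N} = \frac{|\mathcal{I}_{C_1}|}{2} $$

and 

$$ \tilde{n}(i_{S_l}, i_D) = \frac{|\mathcal{I}_{R_l \setminus D}|}{2}, \quad \tilde{n}(i_{S_l}) = \frac{|\mathcal{I}_{R_l}|}{2}; $$

• for $\theta^{cliq}$ is given by (2.38) where for $l = 1$ 

$$ \tilde{n}(i_D) = \frac{|\mathcal{I}_{C_1 \setminus D}|}{2}, \quad \tilde{N} = \frac{|\mathcal{I}_{C_1}|}{2} $$

and for $l = 2, \ldots, k$ 

$$ \tilde{n}(i_F, i_D) = \frac{|\mathcal{I}_{S_l \setminus F}|\, |\mathcal{I}_{R_l \setminus D}|}{2}, \quad \tilde{n}(i_F) = \frac{|\mathcal{I}_{S_l \setminus F}|\, |\mathcal{I}_{R_l}|}{2}; $$

• for $\theta^{mod}$ can be obtained from that of $\theta^{cliq}$ above by expressing it in terms of 
$\theta(\mathcal{D})$ using (2.36) and (2.37). 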

Proof: a) Because of (3.7) and (3.8) it is immediate to verify that the map $\xi \mapsto \theta^{cond}$ 
is block-wise one-to-one; moreover the Jacobian is equal to one. Accordingly the 
reference prior for $\theta^{cond}$ will be exactly as that for $\xi$, with the only difference that the 
probabilities involved will be expressed as functions of $\theta^{cond}$. 

b) Similarly to what happened for the reference prior for $p^{cond}$, the reference prior 
for $\theta^{cond}$ is unchanged if, for each $l = 2, \ldots, k$, we lump together the groups labeled 
by $i_{S_l} \in \mathcal{I}_{S_l}$, and thus only regard $\theta^{cond}$ as made up of $k$ groups. In this way the 
transformation from $\theta^{cond}$ to $\theta^{cliq}$ is block-wise one-to-one, and thus the reference 
prior for $\theta^{cliq}$ is equal to that induced from the reference prior for $\theta^{cond}$. Moreover, the 
transformation is linear so that the Jacobian is constant, and thus the result follows. 

c) We see that the groupings in (3.18) are exactly parallel to those in $\theta^{cliq}$. From 
(2.37) we also see that the $l$-th group in $\theta^{mod}$ is a function of the subsequent $l, l+1, \ldots, k$ 
groups in $\theta^{cliq}$. This defines a block-upper triangular transformation, which 
can be turned into a block-lower triangular one by reversing the order of the groups in 
$\theta^{cliq}$. Since the reference prior on $\theta^{cliq}$ is invariant to group-ordering, we conclude that 
the reference prior on $\theta^{mod}$ can be obtained from that of $\theta^{cliq}$ by a change-of-variable. 
Now notice that the Jacobian is 1, again using (2.37), so that the result is proved. 
Finally, the expressions of the fictitious counts are derived by inspection. 

□ 

We remark that, similarly to what happened for $p^{cond}$, the reference prior for each 
of the three $\theta$-parametrizations is also a conjugate prior and is proper, being the 
transformation of a proper prior on $p^{cond}$. 
4 Parametrizations and reference priors associated to a cut 



The reference priors obtained in the previous section were based on a grouping of the 
parameters defined by the structure of the graph, essentially through a perfect order 
of the cliques (and consequently of residuals and separators). 

Now suppose we are interested in a particular subset $A \subseteq V$ of the variables, and 
that we would like to consider a reference prior which groups together precisely the 
parameters of the marginal distribution referring to $A$. We show in this section how 
this can be done if the Markov model $\mathcal{M}_G$ is collapsible onto $A$, equivalently if $A$ 
represents a cut for the joint distribution. 

Asmussen and Edwards (1983) consider the concept of collapsibility for 
contingency tables. If the set of factors for the table are indexed by $\gamma \in V$ and if $A \subseteq V$, we 
say that $G$ is collapsible onto $A$ if the multinomial model $\mathcal{M}_{G_A}$, Markov with respect 
to the induced subgraph $G_A$, is the same as the model obtained by marginalising the 
given model $\mathcal{M}_G$, Markov with respect to $G$, over the $A$-table. 

Frydenberg (1990, Theorem 5.4) has shown that the model for the random vector 
indexed by $V$, Markov with respect to $G$, is collapsible onto $A$ if and only if the sub-vector 
indexed by $A$ is a cut (for simplicity we shall also say that $A$ is a cut). Cuts in exponential families have 
been introduced in Barndorff-Nielsen (1978) and studied in several further articles 
such as Barndorff-Nielsen and Koudou (1995). 

A very useful result, due to Asmussen and Edwards (1983), is that $A$ will induce a 
cut if and only if every connected component of $V \setminus A$ has a complete 
boundary in $G$. 
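
The graphical criterion of Asmussen and Edwards is easy to check mechanically. The following sketch is illustrative only: the function name `is_cut` and the edge-list representation are assumptions, and the test graph is the one with cliques $\{a,b,c\}$, $\{b,c,d\}$, $\{c,d,e\}$, $\{e,f\}$ from Example 2.1. It finds the connected components of $V \setminus A$ and verifies that the boundary of each is complete.

```python
def is_cut(vertices, edges, A):
    """True if every connected component of V minus A has a complete boundary."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    rest = set(vertices) - set(A)
    visited = set()
    for start in rest:
        if start in visited:
            continue
        # Depth-first search restricted to V minus A.
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend((adj[v] & rest) - component)
        visited |= component

        # Neighbours of the component outside it; they all lie in A.
        boundary = set().union(*(adj[v] for v in component)) - component
        complete = all(v in adj[u] for u in boundary for v in boundary if u != v)
        if not complete:
            return False
    return True


V = "abcdef"
E = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
     ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f")]

cut_cde = is_cut(V, E, "cde")   # components {a, b} and {f}
cut_abc = is_cut(V, E, "abc")   # single component {d, e, f}
cut_ad = is_cut(V, E, "ad")     # boundary {a, d} is not complete
```

For $A = \{c, d, e\}$ the components $\{a, b\}$ and $\{f\}$ have boundaries $\{c, d\}$ and $\{e\}$, both complete, whereas for $A = \{a, d\}$ the single component has boundary $\{a, d\}$, which is not an edge of the graph.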

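The following lemma gives us the factorization of $\mathcal{M}_G$ with respect to the cut $A$ 
and the connected components of $G_{V \setminus A}$. 

Lemma 4.1 Let $A$ be a cut. Let $C'_1, \ldots, C'_{k'}$ be a perfect ordering of the cliques of $G_A$, 
the graph induced by $A$, with separators $S'_l$ and residuals $R'_l$. Let $B_l$, $l = 1, \ldots, p$ be the 
connected components of $G_{V \setminus A}$. Let $C^{(l)}_j$, $j = 1, \ldots, m_l$ be the cliques of the induced graph 
$G_{B_l \cup \partial B_l}$, $l = 1, \ldots, p$, with separators $S^{(l)}_j$ and residuals $R^{(l)}_j$. The 
multinomial model $\mathcal{M}_G$, Markov with respect to $G$, can be factorized as follows 

$$ \prod_{i_{C'_1}} \big(p^{C'_1}(i_{C'_1})\big)^{n(i_{C'_1})} \prod_{l=2}^{k'} \prod_{i_{S'_l}} \prod_{i_{R'_l}} \big(p^{R'_l|i_{S'_l}}(i_{R'_l})\big)^{n(i_{S'_l}, i_{R'_l})} $$
$$ \times \prod_{l=1}^{p} \Big[ \prod_{i_{\partial B_l}} \prod_{i_{C^{(l)}_1 \setminus \partial B_l}} \big(p^{C^{(l)}_1 \setminus \partial B_l | i_{\partial B_l}}(i_{C^{(l)}_1 \setminus \partial B_l})\big)^{n(i_{C^{(l)}_1})} \prod_{j=2}^{m_l} \prod_{i_{S^{(l)}_j}} \prod_{i_{R^{(l)}_j}} \big(p^{R^{(l)}_j | i_{S^{(l)}_j}}(i_{R^{(l)}_j})\big)^{n(i_{S^{(l)}_j}, i_{R^{(l)}_j})} \Big]. \qquad (4.1) $$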

Proof: For simplicity of exposition, some statements concerning the random variables
associated to a set will be simply stated in terms of the set itself. If $A$ is a cut, $A$
separates the connected components of $V \setminus A$; by Theorem 2.8 of Dawid and Lauritzen
(1993), this implies that the $B_l$'s are mutually conditionally independent given $A$.
Moreover, since $A$ is a cut, the boundary $\partial B_l$ of $B_l$ is a complete subset of $A$ and, of
course, it separates $B_l$ from $V \setminus (B_l \cup \partial B_l)$. Therefore the overall multinomial Markov
model factorizes as the product of the $A$-marginal multinomial model, Markov with
respect to $G_A$, and the product of the conditional multinomial distributions of the
$B_l$'s given $i_{\partial B_l}$, $l = 1, \dots, p$.

Since the marginal model for $A$ is Markov with respect to the graph $G_A$, it
factorizes according to a perfect order of the cliques of $G_A$, in parallel to what was
done earlier for $G$; this proves the first line of (4.1).

Let us now consider the expression for the second line of (4.1). As recalled above,
this is given by the product of the conditional multinomial models for $B_l$, $l = 1, \dots, p$,
given $i_{\partial B_l}$. For any $l \in \{1, \dots, p\}$, as a subgraph of $G$, the induced graph $G_{B_l \cup \partial B_l}$ is
decomposable. Moreover the marginal model for $B_l \cup \partial B_l$ is Markov w.r.t. $G_{B_l \cup \partial B_l}$.
This happens because $B_l \cup \partial B_l$ is itself a cut, since the boundary of each connected
component of $G_{V \setminus (B_l \cup \partial B_l)}$ is clearly contained in $\partial B_l$, which is complete. Therefore the
marginal model $\mathcal{M}_{G_{B_l \cup \partial B_l}}$ factorizes according to a perfect order of the cliques
of $G_{B_l \cup \partial B_l}$. Since $\partial B_l$ is complete, it must be contained in a clique of $G_{B_l \cup \partial B_l}$ and, by
Proposition 2.29 of Lauritzen (1996), we know that we can take this clique as the first
in a perfect order $C_j^{(l)}$, $j = 1, \dots, m_l$, of the cliques of $G_{B_l \cup \partial B_l}$.

Figure 1: The decomposable graph for Example 4.1



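The marginal multinomial model $\mathcal{M}_{G_{B_l \cup \partial B_l}}$ can therefore be written as

\[
\begin{aligned}
&\prod_{i_{C_1^{(l)}}} \bigl(p^{C_1^{(l)}}(i_{C_1^{(l)}})\bigr)^{n(i_{C_1^{(l)}})}
 \prod_{j=2}^{m_l} \prod_{i_{S_j^{(l)}}} \prod_{i_{R_j^{(l)}}}
 \bigl(p^{R_j^{(l)} \mid i_{S_j^{(l)}}}(i_{R_j^{(l)}})\bigr)^{n(i_{C_j^{(l)}})} \\
&= \prod_{i_{\partial B_l}} \bigl(p^{\partial B_l}(i_{\partial B_l})\bigr)^{n(i_{\partial B_l})}
 \prod_{i_{\partial B_l}} \prod_{i_{C_1^{(l)} \setminus \partial B_l}}
 \bigl(p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}(i_{C_1^{(l)} \setminus \partial B_l})\bigr)^{n(i_{C_1^{(l)}})} \\
&\quad \times \prod_{j=2}^{m_l} \prod_{i_{S_j^{(l)}}} \prod_{i_{R_j^{(l)}}}
 \bigl(p^{R_j^{(l)} \mid i_{S_j^{(l)}}}(i_{R_j^{(l)}})\bigr)^{n(i_{C_j^{(l)}})}
\end{aligned}
\]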



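and therefore the model for $B_l$ conditional on $i_{\partial B_l}$ is equal to

\[
\prod_{i_{\partial B_l}} \prod_{i_{C_1^{(l)} \setminus \partial B_l}}
 \bigl(p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}(i_{C_1^{(l)} \setminus \partial B_l})\bigr)^{n(i_{C_1^{(l)}})}
 \prod_{j=2}^{m_l} \prod_{i_{S_j^{(l)}}} \prod_{i_{R_j^{(l)}}}
 \bigl(p^{R_j^{(l)} \mid i_{S_j^{(l)}}}(i_{R_j^{(l)}})\bigr)^{n(i_{C_j^{(l)}})}.
\tag{4.2}
\]

Since this is true for all $B_l$, the result is established. □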



Example 4.1 Suppose that the joint distribution of the 11 variables numbered con-
secutively from 1 to 11 is Markov with respect to the decomposable graph $G$ as given
in Fig. 1.
Consider the subset of variables given by $A = \{1, 2, 3, 4\}$. A perfect ordering of
the cliques of the induced sub-graph $G_A$ is



\[
C'_1 = \{1, 2\}, \quad C'_2 = \{2, 3\}, \quad C'_3 = \{3, 4\},
\tag{4.3}
\]



so that $S'_2 = \{2\}$, $S'_3 = \{3\}$, $R'_2 = \{3\}$, $R'_3 = \{4\}$. The connected components $B_l$ of
$G_{V \setminus A}$, their boundaries $\partial B_l$, together with the cliques $C_j^{(l)}$ of $G_{B_l \cup \partial B_l}$ are

  $l$   $B_l$          $\partial B_l$   $B_l \cup \partial B_l$   $C_j^{(l)}$
  1     {9,10,11}      {2}              {2,9,10,11}               $C_1^{(1)}$ = {2,9,10}, $C_2^{(1)}$ = {10,11}
  2     {8}            {2,3}            {2,3,8}                   $C_1^{(2)}$ = {2,3,8}
  3     {5}            {3}              {3,5}                     $C_1^{(3)}$ = {3,5}
  4     {6,7}          {3,4}            {3,4,6,7}                 $C_1^{(4)}$ = {3,4,6,7}



A graphical display of $G_A$ and the connected components of $G_{V \setminus A}$ is given in Fig. 2.
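
The components and boundaries listed for Example 4.1 can be recomputed mechanically. The sketch below is illustrative only: since Figure 1 is not reproduced here, it assumes that the edges of $G$ are exactly those implied by the cliques listed in the example, and the helper names (`adjacency`, `components_with_boundaries`) are hypothetical.

```python
from itertools import combinations

# Cliques taken from Example 4.1; the edge set of G is ASSUMED to be
# exactly the set of pairs contained in one of these cliques.
CLIQUES = [{1, 2}, {2, 3}, {3, 4}, {2, 9, 10}, {10, 11},
           {2, 3, 8}, {3, 5}, {3, 4, 6, 7}]
A = {1, 2, 3, 4}

def adjacency(cliques):
    """Neighbour sets of the graph whose edges are the pairs inside each clique."""
    adj = {}
    for clique in cliques:
        for u, v in combinations(sorted(clique), 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def components_with_boundaries(A, adj):
    """Map each connected component B of V \\ A to its boundary in G."""
    rest = set(adj) - set(A)
    result, seen = {}, set()
    for start in sorted(rest):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend((adj[v] & rest) - comp)
        seen |= comp
        result[frozenset(comp)] = set().union(*(adj[v] for v in comp)) - comp
    return result

adj = adjacency(CLIQUES)
table = components_with_boundaries(A, adj)
for comp, bd in table.items():
    # Each boundary must be complete for A to be a cut.
    complete = all(v in adj[u] for u, v in combinations(sorted(bd), 2))
    print(sorted(comp), sorted(bd), complete)
```

Under the stated assumption on the edge set, every boundary is complete, which agrees with $A = \{1, 2, 3, 4\}$ being a cut.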

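Accordingly, the multinomial model, Markov with respect to $G$, can be factorized
using Lemma 4.1 as

\[
\begin{aligned}
&\prod_{i_{C'_1}} \bigl(p^{C'_1}(i_{C'_1})\bigr)^{n(i_{C'_1})}
 \prod_{l=2}^{3} \prod_{i_{S'_l}} \prod_{i_{R'_l}}
 \bigl(p^{R'_l \mid i_{S'_l}}(i_{R'_l})\bigr)^{n(i_{C'_l})} \\
&\quad \times \prod_{l=1}^{4} \prod_{i_{\partial B_l}} \prod_{i_{C_1^{(l)} \setminus \partial B_l}}
 \bigl(p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}(i_{C_1^{(l)} \setminus \partial B_l})\bigr)^{n(i_{C_1^{(l)}})} \\
&\quad \times \prod_{i_{S_2^{(1)}}} \prod_{i_{R_2^{(1)}}}
 \bigl(p^{R_2^{(1)} \mid i_{S_2^{(1)}}}(i_{R_2^{(1)}})\bigr)^{n(i_{C_2^{(1)}})},
\end{aligned}
\]

where $S_2^{(1)} = \{10\}$ and $R_2^{(1)} = \{11\}$.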



Figure 2: The decomposable graph $G_A$ associated to a cut $A$ and the connected
components of $G_{V \setminus A}$ for Example 4.1

We now provide the expression for the reference prior associated to a cut.

Theorem 4.1 Let $A$ be a cut and consider the parametrization associated to $A$

\[
p_A^{cut} = \bigl(p_A^{cond}, \; p_{V \setminus A \mid A}^{cond}\bigr),
\]

where

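\[
p_A^{cond} = \bigl(p^{C'_1}, \; p^{R'_l \mid i_{S'_l}}, \; l = 2, \dots, q, \; i_{S'_l} \in \mathcal{I}_{S'_l}\bigr)
\tag{4.4}
\]

\[
p_{V \setminus A \mid A}^{cond} = \bigl(p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}, \; i_{\partial B_l} \in \mathcal{I}_{\partial B_l}; \;
 p^{R_j^{(l)} \mid i_{S_j^{(l)}}}, \; l = 1, \dots, p, \; j = 2, \dots, m_l, \; i_{S_j^{(l)}} \in \mathcal{I}_{S_j^{(l)}}\bigr)
\tag{4.5}
\]

using the notation presented in Lemma 4.1. The reference prior for $p_A^{cut}$, relative to
the grouping (4.4) and (4.5), is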



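\[
\begin{aligned}
\pi(p_A^{cut}) \propto{}
&\prod_{i_{C'_1}} \bigl(p^{C'_1}(i_{C'_1})\bigr)^{-1/2}
 \prod_{l=2}^{q} \prod_{i_{S'_l}} \prod_{i_{R'_l}}
 \bigl(p^{R'_l \mid i_{S'_l}}(i_{R'_l})\bigr)^{-1/2} \\
&\times \prod_{l=1}^{p} \Bigl[
 \prod_{i_{\partial B_l}} \prod_{i_{C_1^{(l)} \setminus \partial B_l}}
 \bigl(p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}(i_{C_1^{(l)} \setminus \partial B_l})\bigr)^{-1/2}
 \prod_{j=2}^{m_l} \prod_{i_{S_j^{(l)}}} \prod_{i_{R_j^{(l)}}}
 \bigl(p^{R_j^{(l)} \mid i_{S_j^{(l)}}}(i_{R_j^{(l)}})\bigr)^{-1/2}
 \Bigr].
\end{aligned}
\]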



We emphasize that, also for this case, the prior admits a conjugate structure and is 
proper, being a product of Jeffreys' priors. 
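
Because each factor of the prior is a Jeffreys kernel for a saturated (conditional) multinomial, that is, a Dirichlet distribution with all parameters equal to 1/2, prior-to-posterior updating can be carried out block by block. The following sketch illustrates this for a single block; it assumes multinomial sampling with the factorized likelihood of Lemma 4.1, and the function names and the toy count table are hypothetical.

```python
def posterior_dirichlet(counts, prior=0.5):
    """Posterior Dirichlet parameters for one block of conditional probabilities.

    `counts[s][r]` is the observed count n(i_S = s, i_R = r) for one
    (separator, residual) pair; each row (one separator cell) is an
    independent Dirichlet whose prior parameters all equal `prior`
    (1/2 for the Jeffreys kernel p^(-1/2)).
    """
    return [[n + prior for n in row] for row in counts]

def posterior_means(counts, prior=0.5):
    """Posterior means of the conditional cell-probabilities, row by row."""
    params = posterior_dirichlet(counts, prior)
    return [[a / sum(row) for a in row] for row in params]

# Toy 2 x 2 table of counts (an assumption for illustration):
# rows index separator cells, columns index residual cells.
counts = [[3, 1],
          [0, 4]]
print(posterior_dirichlet(counts))  # [[3.5, 1.5], [0.5, 4.5]]
print(posterior_means(counts))      # [[0.7, 0.3], [0.1, 0.9]]
```

The same update applies to the block for the first clique, treated as a single row with no conditioning cell.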

Proof: Using Lemma 4.1, the likelihood factorizes into a product of two general terms,
one related to the marginal distribution of $A$ indexed by $p_A^{cond}$, the other related to
the conditional distribution of $V \setminus A$ given $A$ indexed by $p_{V \setminus A \mid A}^{cond}$. The two groups of
parameters are variation and likelihood independent, so that the information matrix
is two-block-diagonal. The marginal distribution related to $A$ is a $G_A$-Markov model,
with $G_A$ decomposable, and therefore the corresponding reference prior is exactly as
in the general decomposable case of Proposition 3.1. This yields the first line of the
kernel of the reference prior.

To prove the second line, we have to consider the second block of the infor-
mation matrix. This actually further decomposes into $p$ diagonal blocks, one for
each connected component $B_l$. Consider the block corresponding to the model
for $B_l$ conditional on $\partial B_l$, $l = 1, \dots, p$ (see (4.2)). Each block represents the in-
formation of a saturated conditional multinomial. In particular the first term has
cell-probabilities $p^{C_1^{(l)} \setminus \partial B_l \mid i_{\partial B_l}}$ and $n(i_{\partial B_l})$ trials, while the remaining terms have
cell-probabilities $p^{R_j^{(l)} \mid i_{S_j^{(l)}}}$ and $n(i_{S_j^{(l)}})$ trials. The expression of the corresponding term in
the information matrix will therefore be as in the general conditional saturated multi-
nomial, see (3.6). Finally, the expectation of $n(i_{\partial B_l})$ depends only on the parameter
$p_A^{cond}$, since $\partial B_l \subseteq A$; similarly the expectation of $n(i_{S_j^{(l)}})$ does not depend on the
parameter $p^{R_j^{(l)} \mid i_{S_j^{(l)}}}$ specific to the component because of the perfect ordering of the
cliques. Therefore, in both cases the term corresponding to the expectation factors
out of the determinant and the proof is complete. □



5 Discussion 

In this paper we have considered several alternative parametrizations for discrete de- 
composable graphical models. First of all we have described a parametrization in 
terms of conditional cell-probabilities. Next we have derived three alternative rep- 
resentations in terms of natural exponential families, whose canonical parameters 
represent generalized log-odds ratios relative to suitable cell-probabilities. Specifi-
cally, $\theta^{mod}$ refers to the joint probabilities of the full table and has been previously
used, see e.g. Dellaportas and Forster (1999), Dellaportas and Tarantola (2005) and
Liu and Massam (2007), but we think that our derivation makes
its interpretation clearer. The parametrizations $\theta^{cond}$ and $\theta^{clique}$, on the other hand,
refer to marginal sub-tables and are quite distinct from those traditionally employed 
in graphical log-linear modelling. Indeed they are rather related to the concept of 
marginal models, see e.g. Bergsma and Rudas (2002), Lang and Agresti (1994) and 
Glonek and McCullagh (1995).

A reference prior for each of the above parametrizations was constructed. In 
particular the prior for the conditional cell-probabilities of the residuals given the 
separators is a product of Jeffreys' priors. We showed that all reference priors are
coherent, i.e. each is equivalent to any other one. This happens because the grouping 
structure is such that the transformation between any two parametrizations is either 
block-diagonal or block-lower triangular. A notable feature is that all reference priors 
are proper. Another property is that they belong to a conjugate family, which facili-
tates prior-to-posterior updating. The conjugacy feature is consistent with previous
results, see Consonni et al. (2004), wherein reference priors for suitable parametriza-
tions of NEFs having a simple quadratic variance (such as the saturated multinomial)
were derived and shown to belong to (enriched) conjugate families. Our paper shows 
that this result continues to hold also for multinomial decomposable models, whose 
variance function is not quadratic. With hindsight, this is not surprising, because of 
the recursive factorization into products of conditional saturated multinomial models 
that holds when G is decomposable. We have also considered a parametrization, and 
the corresponding reference prior, associated to a cut. This can be especially useful 
whenever the parameters of a marginal table are of primary
inferential interest.

Throughout the paper we assumed a given graphical model, and constructed ref- 
erence priors essentially in view of estimation purposes. While we are aware that 
estimation-based priors should not be routinely used in model determination, we re- 
mark that our reference priors are conjugate (and thus decompose into local blocks 
precisely like the likelihood) and that they are proper. These attractive features make
them natural candidates also for model comparison, e.g. via Bayes factors, at least 
for a preliminary and informal evaluation. 
A Appendix

Proof of (2.37). Consider $D \subseteq C_l$, $D \cap R_l \neq \emptyset$ for some $l \in \{1, \dots, k-1\}$ such that
also $D \subseteq S_j$ for some $j \in \{> l\}$, then

P^'i^o) = E Pi^D,jL)= E exp( E 0{ie)+ E 0iiE,jG)) 

= (exp E E exp( E di^E,jG] 

ECqD LCG^ EQoD,GQL,jG€l}. 

logp^'^iio) = E ^^(^i?) + log (l + E exp( E 

ECqD LCC^ ECoD,GCL,jG&lG 

e{lE) = logp^'(zc) -l0g(l+ E exp( E ^i^E^jG) 

ECqD LCCf ECoD,GCL,jG'^lG 

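This last equality is of the form

\[
\sum_{E \subseteq_0 D} \theta(i_E) = \varphi(i_D)
\tag{A.1}
\]

and therefore, by the Moebius inversion formula, we have

\[
\theta(i_D) = \sum_{F \subseteq_0 D} (-1)^{|D \setminus F|} \varphi(i_F).
\tag{A.2}
\]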
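
The inversion step can be checked numerically on a small index set. The sketch below is only an illustration of the Moebius inversion formula over the subset lattice: it sums over all subsets of $D$, including the empty set, which is an assumption about the convention behind $\subseteq_0$, and the names `subsets`, `zeta` and `moebius` are hypothetical.

```python
from itertools import combinations

def subsets(D):
    """All subsets of D (including the empty set) as frozensets."""
    items = sorted(D)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def zeta(theta, D):
    """phi(D) = sum of theta(E) over all subsets E of D."""
    return sum(theta[E] for E in subsets(D))

def moebius(phi, D):
    """theta(D) = sum over F subset of D of (-1)^{|D \\ F|} phi(F)."""
    D = frozenset(D)
    return sum((-1) ** len(D - F) * phi(F) for F in subsets(D))

# Arbitrary integer-valued theta on the subsets of a three-element set
# (toy values chosen for illustration, so that equality is exact).
V = frozenset("abc")
theta = {E: 3 * len(E) + sum(ord(x) for x in E) % 7 for E in subsets(V)}

phi = lambda F: zeta(theta, F)
recovered = {D: moebius(phi, D) for D in subsets(V)}
print(recovered == theta)
```

If the sums are restricted to non-empty subsets instead, the same identity holds provided the value attached to the empty set is taken to be zero.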
For $l = 2, \dots, k$, let C^i = \ C/. Then (A.2) can be written as



= O'^'itn)- E (-l)l''\^llog(l+ E e^P( E 

FCqD LCCf HCoF,GCL,jG&G 



- E (-i)"'^^'iog(i+ f E + E + E lexp( E 0i^H,jG 

Jg'^^'g 

- ^ (-l)l^\^llog(l+ E exp( E di^H,jG))){l+ E exp( E ^(^^^'^G 

~ ~ GCL, ~ GCi, 

0'''{^D)- E (-l)"'^^'log(l+ E e^P( E 0i^H,jG)) 



GCL, 



G 



- E (-i)i^\^iiog(i+ E exp( E ^(^H,jG 

GCL, 
JG^^h 

We now want to show that the term 

E (-l)l^\''ll0g(l+ E e^P( E dilH.jG))) (A.3) 

GCL, 

in the equation above is equal to zero. 

Let F be an arbitrary subset of D and let / = F fl Hi_i. Since G ^ L (1 C^i, in 
order for 9{iH,jG)y H Cq F, G to be non zero, it is necessary that H I and 
therefore 

(l+ E exp( E 0{tjj,jG))) = {l+ E exp( E 0{zh,Jg))). (A.4) 

GCL, GCL, 

We see that the right hand side of (A.4) above is the same for all $F \subseteq D$ that
have the same intersection $I$ with $H_{l-1}$. We therefore consider all such $F$'s. Since

$D \cap R_l \neq \emptyset$, there are as many such $F$'s with $|D \setminus F|$ odd as there are with $|D \setminus F|$
even and therefore from (A.4) it follows that, for a given $I$,

E (-l)l^\^llog(l+ E exp( Oi^H,jG)))=0. (A.5) 

FnHi_i=I GCL, 

Since this is true for all $I \subseteq_0 D \cap H_{l-1}$, it follows immediately from (A.5) that (A.3)
is equal to zero and we have

FCD LCCr^ix HQoF, 

^ ' GCL, 

jG^^h 

Since $D \subseteq S_m \cap C_l$ for some $m \in \{> l\}$ and $G \subseteq L \subseteq C_m \setminus C_l$ is non empty, in the
right hand side of the equation above, we have that either $\theta(i_H, j_G) = \theta^{C_m}(i_H, j_G)$ or
that $\theta(i_H, j_G)$ can be expressed using (2.37) recursively and therefore $\theta(i_D)$ can be
expressed in terms of $\theta^{C_m}(i_E)$, $m \in \{> l\}$, $E \subseteq C_m$, $i_E \in \mathcal{I}_E$. Formula (2.37) is thus
proved.